Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Uploads + media_server app #33

Merged
merged 2 commits into from
Apr 22, 2023
Merged

Conversation

ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Feb 16, 2023

Allow Content data to be stored as FileFields backed by django-storages, and serve those assets via a simple internal app.

I'm afraid I muddled this PR a bit. I needed to fix the dependencies, and I got that muddled in with my actual code changes (though they're in separate commits). I also ended up running black on the Python files I touched, meaning there are some unrelated formatting issues higher up in the files that got caught. I can break this up if it's too much to sort through.

From the commit message:

This alters the data model in the components app in a number of ways in
order to support static assets, along with some refactoring:

* Content has been renamed to RawContent to make it more clear when we
  are talking about "content" as a general concept and the actual data
  model that holds the raw bytes.
* RawContent now uses FileField instead of BinaryField, giving us more
  cheaply scalable storage in exchange for higher latency. This is
  offset by the new TextContent model that will be used to store the
  text versions of RawContent that needs low-latency access (like XBlock
  OLX).
* A primitive media_server app now exists to view static assets during
  development. It is NOT safe to use on a running site yet.

@ormsbee ormsbee changed the title File Uploads with media_server app File Uploads + media_server app Feb 16, 2023
@ormsbee ormsbee self-assigned this Mar 3, 2023
@ormsbee
Copy link
Contributor Author

ormsbee commented Mar 17, 2023

The part I'm struggling with the most here are the asset download permissions. I had that encoded into the RawContent at one point, and then backed it out to the ContentVersionRawContent through model. But thinking on it more, it feels more like it should live outside the versioning workflow altogether–since if you accidentally left something public, you wouldn't want to leave old versions of it public when you lock the current version down.

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Mar 21, 2023

To what extent is learning core responsible for permissions?
Are we going with is_public content is CDN cacheable and public, and everything else requires a signed URL, which can be retrieved by (e.g. the LMS) and given to authorized users, but Learning Core doesn't know anything beyond that?

Edit: lol never mind, I see you answered that above.

@ormsbee ormsbee force-pushed the file_upload branch 3 times, most recently from d18dc2d to 3e50b18 Compare April 14, 2023 02:49
@ormsbee ormsbee marked this pull request as ready for review April 14, 2023 02:53
@ormsbee
Copy link
Contributor Author

ormsbee commented Apr 14, 2023

@bradenmacdonald, @kdmccormick, @feanil: Ready for review.

@ormsbee
Copy link
Contributor Author

ormsbee commented Apr 19, 2023

Rebased to resolve some dependency conflicts.

Copy link
Contributor

@feanil feanil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 question about Django otherwise, I don't think I have any more questions/feedback at the moment.

@@ -1,15 +1,6 @@
# Core requirements for using this application
-c constraints.txt

Django # Web application framework
Django<4.0 # Web application framework
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for not using a newer Django here? Because we'll be running it as an edx-platform library eventually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I haven't set up tox or anything yet to test across multiple versions, so I just wanted to make sure when I'm doing the dev locally that I'm not using any features that don't exist in 3.2.

Copy link
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some very minor questions/comments. Looks great!

raw_content=raw_content,
text=data_str,
length=len(data_str),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if len(data_bytes) < MAX_TEXT_LENGTH before trying to save the copy of the OLX into TextContent ? (store it on S3 only, but still allow it) Or do we want to always throw an error for such large OLX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I want it to throw an error. Anything that big is probably going to cause issues down the road, and I'm honestly curious where we actually have that in the wild. I suspect in most cases, it's because something unexpected and weird is happening, like copy-pasting from a Word doc and bringing the images over as base64-encoded HTML.

Guidelines
----------

Nothing from ``lib`` or ``core`` should *ever* import from ``contrib``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isolated apps linter could enforce this for you if you wanted down the road :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have some primitive import linting set up here, though I need to actually stop being lazy and hook up the CI to run it properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh awesome :)

raw_content.file.save(
f"{raw_content.learning_package.uuid}/{hash_digest}",
ContentFile(data_bytes),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, at least for smaller OLX content, why to we need to save the file to S3 if we have it in-DB via TextContent? Especially if it's learner_downloadable=False, is there any use case for reading from S3? From the docstring on the models it sounds like this is just for overall consistency, but I'm wondering if there's a use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly consistency, and I thought it might make export simpler. I also thought that there might be some edge cases where we want to later promote RawContent to TextContent after the fact, and have them exist in both places–like video transcripts. But mostly I just like the idea that everything is minimally represented as an opaque blob, and there can be multiple levels of progressive enhancement of that data over time, e.g. XBlockContent, Video, etc.

openedx_learning/core/components/models.py Outdated Show resolved Hide resolved
# is a part of hasn't started yet. That's a matter of LMS permissions and
# policy that is not intrinsic to the content itself, and exists at a layer
# above this.
learner_downloadable = models.BooleanField(default=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Thank you for this nice, clear, detailed docstring and field name 💯

openedx_learning/core/components/admin.py Outdated Show resolved Hide resolved
David Ormsbee added 2 commits April 21, 2023 22:39
This alters the data model in the components app in a number of ways in
order to support static assets, along with some refactoring:

* Content has been renamed to RawContent to make it more clear when we
  are talking about "content" as a general concept and the actual data
  model that holds the raw bytes.
* RawContent now uses FileField instead of BinaryField, giving us more
  cheaply scalable storage in exchange for higher latency. This is
  offset by the new TextContent model that will be used to store the
  text versions of RawContent that needs low-latency access (like XBlock
  OLX).
* A primitive media_server app now exists to view static assets during
  development. It is NOT safe to use on a running site yet.
The codecov dependency had to be replaced because it got pulled from
PyPI. The DRF dependency somehow never made it in there before.
@ormsbee ormsbee merged commit 5390dd4 into openedx:main Apr 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

4 participants